Canonical Least Squares Clustering on Sparse Medical Data
Abstract
We explore applications of Canonical Least Squares (CLS) clustering to a corpus of sparse medical claims from a particular health insurance provider. We find several reasons why most conclusions based on CLS clusters might be misleading, especially when the data is very sparse. We illustrate these findings with a number of synthetic experiments that focus mainly on the sparsity issue, since it has not been explored in the literature before. Based on the insights from the synthetic experiments, we show how CLS clustering can potentially be applied to identify hospital peer groups: hospitals that share similar operational characteristics. In addition, we demonstrate that CLS clustering can be used to improve predictions of patients' length of stay in hospitals.
Similar resources
Sparse Nonnegative Matrix Factorization for Clustering
Properties of Nonnegative Matrix Factorization (NMF) as a clustering method are studied by relating its formulation to other methods such as K-means clustering. We show how interpreting the objective function of K-means as that of a lower rank approximation with special constraints allows comparisons between the constraints of NMF and K-means and provides the insight that some constraints can b...
Learning Mixtures of Multi-Output Regression Models by Correlation Clustering for Multi-View Data
In many datasets, different parts of the data may have their own patterns of correlation, a structure that can be modeled as a mixture of local linear correlation models. The task of finding these mixtures is known as correlation clustering. In this work, we propose a linear correlation clustering method for datasets whose features are pre-divided into two views. The method, called Canonical Le...
Subspace Clustering Reloaded: Sparse vs. Dense Representations
State-of-the-art methods for learning unions of subspaces from a collection of data leverage sparsity to form representations of each vector in the dataset with respect to the remaining vectors in the dataset. The resulting sparse representations can be used to form a subspace affinity matrix to cluster the data into their respective subspaces. While sparsity-driven methods for subspace cluster...
Sparse Non-negative Matrix Factorizations via Alternating Non-negativity-constrained Least Squares
Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Non-negative matrix factorization (NMF) is a useful technique in approximating these high dimensional data. Sparse NMFs are also useful when we need to control the degree of sparseness in non-negative basis vectors ...
A Least-Squares Unified View of PCA, LDA, CCA and Spectral Graph Methods
Over the last century Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA) and Spectral Clustering (SC) have been extensively used as a feature extraction step for modeling, classification, visualization, and clustering. This paper proposes a unified framework to formulate PCA, LDA, CCA, and SC as a ...